MA2224 Data Analysis
January 27, 2020
Despite being called Data Analysis, this class is essentially a classic probability and statistics course. What this means is that the methods taught in the class are primarily intended for controlled experiments that you might do in a lab.
Though a distinction won't often be made between experiments, where variables can be manipulated by the researcher, and observational studies, where researchers have little or no control over variables, keep in mind that anything non-experimental usually needs added layers of thought in the real world.
A more modern class titled Data Analysis would probably include some data collection and cleaning, machine learning and statistical software.
As a long-term guiding question for the class, you can think of our ultimate goal as finding answers to the question:
“How does what you see compare to what you expect to see?”
Concrete examples of this would be: “You toss a coin 100 times and get 40 heads. Is the coin fair?” or “You scored a 980 on the SAT. Is this good?” or “There are 2 bedbugs on your subway car. Is that a lot?” All three questions have the following in common:
Typically, a researcher will be interested in determining the expected or usual value/behavior, or they will be looking to see if something has deviated from this expected behavior.
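As a rough sketch of the coin question, assume the coin is fair and the tosses are independent, so the number of heads follows a binomial model. That model and the function name prob_at_least_as_far are assumptions made for illustration, not something these notes have set up yet. The snippet computes the number of heads you would expect and the chance of landing at least as far from that expectation as 40 heads.

```python
from math import comb

def prob_at_least_as_far(n, observed, p=0.5):
    """Chance that the head count is at least as far from n*p as `observed`,
    assuming n independent tosses with P(heads) = p (an assumed model)."""
    expected = n * p
    gap = abs(observed - expected)
    total = 0.0
    for k in range(n + 1):
        # Add up the binomial probability of every count at least `gap` away.
        if abs(k - expected) >= gap:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

expected_heads = 100 * 0.5                # what you expect to see
chance = prob_at_least_as_far(100, 40)    # how unusual what you saw is
print(expected_heads, round(chance, 3))
```

Under these assumptions the chance comes out a bit under 6%, so 40 heads is unusual for a fair coin but not wildly so. Whether that counts as "far enough" from the expectation is exactly the kind of comparison the course builds toward.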
We spend roughly 70% of the course dealing with the expectation and comparison elements of this question. Then we spend a single class reminding you how to calculate the mean, median, etc., and then we put it all together to answer the question.
I’ll mention this idea throughout the semester. If you ever feel lost, it may be useful to return to this core question and try to place the material in this framework.
To start, we need to figure out what we should expect to happen when we do things, which means we need to know how likely things are to happen, so we gotta start with probability.
If probability tries to answer the question “What will happen?” then you have to start by thinking about all the things that could happen. This is called the sample space.
The Sample Space is the set of all possible outcomes from some action. It is the collection of things that you might possibly see.
What would be the sample space of flipping a coin once?
In actuality, a sample space is somewhat subjective, but in this class we make the standard assumption in math classes which is to keep it boring.
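A minimal sketch of what "keeping it boring" looks like, written as Python sets. The coin flip is the example asked about above; the two-flip and die examples are hypothetical additions for illustration.

```python
from itertools import product

# One coin flip: the boring sample space ignores things like the coin
# landing on its edge or rolling under the couch.
one_flip = {"H", "T"}

# Two flips: every ordered pair of results.
two_flips = set(product("HT", repeat=2))

# One roll of a six-sided die.
one_die = set(range(1, 7))

print(sorted(one_flip))
print(sorted(two_flips))
print(sorted(one_die))
```

A less boring sample space for one flip might also include "edge", which is the sense in which the choice is somewhat subjective.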
From the examples above we can identify two types of sample spaces:
A Discrete Sample Space is a sample space with either finitely many outcomes or an “infinite list” of outcomes.
A Continuous Sample Space is a sample space with as many outcomes as there are points in an interval of real numbers.
In the examples above, the time to finish the exam, the time to pay off student loans and the height of a Tandon student are continuous. The others are discrete.
Since we just need basic probability ideas to get into statistics, we only focus on discrete sample spaces until we move on to random variables.
I’ll generally refer to the actual elements inside the sample space as outcomes to distinguish them from events (which, technically, they also are). Once you start combining outcomes, we start talking more officially about set theory.
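To make the outcome/event distinction concrete, here is a small sketch using Python sets. The die roll and the event names are hypothetical, chosen only for illustration.

```python
# Sample space for one roll of a six-sided die (a hypothetical example).
sample_space = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space, built by combining outcomes.
even = {2, 4, 6}
at_least_five = {5, 6}

either = even | at_least_five     # union: even OR at least five
both = even & at_least_five       # intersection: even AND at least five
not_even = sample_space - even    # complement: NOT even

# A single outcome can also be viewed as a one-element event.
rolled_three = {3}

print(sorted(either), sorted(both), sorted(not_even))
```

The set operations here (union, intersection, complement) are the pieces of set theory that show up once outcomes get combined into events.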
Take the Sample Space of some experiment to be
Throwing probability into the story is pretty easy. All you do is take your sample space outcomes and assign each a number…with some rules called axioms meaning you just accept them because you are paying to trust what I say.
The first axiom just says that you can make the probability of something anything you want so long as you stay within 0 and 1.
The second axiom says that once you set the probabilities for all outcomes in a sample space, you need those outcome probabilities to add up to 1.
The final axiom says that if you want the probability that either of two events that cannot happen together occurs, you add their probabilities.
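For a discrete sample space, the axioms in their standard form (every probability between 0 and 1, the outcome probabilities adding up to 1, and adding probabilities for things that cannot happen together) can be sketched as a quick check. The loaded die and the function names satisfies_axioms and event_probability are hypothetical, made up for illustration.

```python
from math import isclose

# Hypothetical probabilities assigned to the outcomes of a loaded die.
prob = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

def satisfies_axioms(prob):
    """Check the first two axioms for an outcome -> probability mapping."""
    in_range = all(0 <= p <= 1 for p in prob.values())
    # isclose guards against floating-point rounding in the sum.
    sums_to_one = isclose(sum(prob.values()), 1.0)
    return in_range and sums_to_one

def event_probability(event, prob):
    """Additivity: distinct outcomes cannot happen together, so add them up."""
    return sum(prob[outcome] for outcome in event)

print(satisfies_axioms(prob))
print(event_probability({5, 6}, prob))
```

An assignment like {"H": 0.7, "T": 0.7} stays within 0 and 1 for each outcome but fails the check, since the probabilities add up to more than 1.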
Consider the sample space